Start: AIC=1589.64
medv ~ crim + zn + indus + chas + nox + rm + age + dis + rad +
tax + ptratio + black + lstat
Df Sum of Sq RSS AIC
- age 1 0.06 11079 1587.7
- indus 1 2.52 11081 1587.8
<none> 11079 1589.6
- chas 1 218.97 11298 1597.5
- tax 1 242.26 11321 1598.6
- crim 1 243.22 11322 1598.6
- zn 1 257.49 11336 1599.3
- black 1 270.63 11349 1599.8
- rad 1 479.15 11558 1609.1
- nox 1 487.16 11566 1609.4
- ptratio 1 1194.23 12273 1639.4
- dis 1 1232.41 12311 1641.0
- rm 1 1871.32 12950 1666.6
- lstat 1 2410.84 13490 1687.3
Step: AIC=1587.65
medv ~ crim + zn + indus + chas + nox + rm + dis + rad + tax +
ptratio + black + lstat
Df Sum of Sq RSS AIC
- indus 1 2.52 11081 1585.8
<none> 11079 1587.7
+ age 1 0.06 11079 1589.6
- chas 1 219.91 11299 1595.6
- tax 1 242.24 11321 1596.6
- crim 1 243.20 11322 1596.6
- zn 1 260.32 11339 1597.4
- black 1 272.26 11351 1597.9
- rad 1 481.09 11560 1607.2
- nox 1 520.87 11600 1608.9
- ptratio 1 1200.23 12279 1637.7
- dis 1 1352.26 12431 1643.9
- rm 1 1959.55 13038 1668.0
- lstat 1 2718.88 13798 1696.7
Step: AIC=1585.76
medv ~ crim + zn + chas + nox + rm + dis + rad + tax + ptratio +
black + lstat
Df Sum of Sq RSS AIC
<none> 11081 1585.8
+ indus 1 2.52 11079 1587.7
+ age 1 0.06 11081 1587.8
- chas 1 227.21 11309 1594.0
- crim 1 245.37 11327 1594.8
- zn 1 257.82 11339 1595.4
- black 1 270.82 11352 1596.0
- tax 1 273.62 11355 1596.1
- rad 1 500.92 11582 1606.1
- nox 1 541.91 11623 1607.9
- ptratio 1 1206.45 12288 1636.0
- dis 1 1448.94 12530 1645.9
- rm 1 1963.66 13045 1666.3
- lstat 1 2723.48 13805 1695.0
Call:
lm(formula = medv ~ crim + zn + chas + nox + rm + dis + rad +
tax + ptratio + black + lstat, data = Boston)
Residuals:
Min 1Q Median 3Q Max
-15.5984 -2.7386 -0.5046 1.7273 26.2373
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 36.341145 5.067492 7.171 2.73e-12 ***
crim -0.108413 0.032779 -3.307 0.001010 **
zn 0.045845 0.013523 3.390 0.000754 ***
chas 2.718716 0.854240 3.183 0.001551 **
nox -17.376023 3.535243 -4.915 1.21e-06 ***
rm 3.801579 0.406316 9.356 < 2e-16 ***
dis -1.492711 0.185731 -8.037 6.84e-15 ***
rad 0.299608 0.063402 4.726 3.00e-06 ***
tax -0.011778 0.003372 -3.493 0.000521 ***
ptratio -0.946525 0.129066 -7.334 9.24e-13 ***
black 0.009291 0.002674 3.475 0.000557 ***
lstat -0.522553 0.047424 -11.019 < 2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 4.736 on 494 degrees of freedom
Multiple R-squared: 0.7406, Adjusted R-squared: 0.7348
F-statistic: 128.2 on 11 and 494 DF, p-value: < 2.2e-16
Process:
Step 1: Start with All Variables
The full model includes all 13 predictors. The Akaike Information Criterion (AIC) is 1589.64, which measures model quality (lower = better).
medv=β0+β1crim+β2zn+β3indus+β4chas+β5nox+β6rm+β7age+β8dis+β9rad+β10tax+β11ptratio+β12black+β13lstat+ϵ
Step 2: Remove Least Important Variable
The step-by-step procedure evaluates the contribution of each variable. Removing a variable is done if it has no discernible effect on the accuracy of the model.
The first to be eliminated was age (old houses) due to the following: P-value (not statistically significant) = 0.958. Didn’t improve R^2 much better. AIC decreased, indicating a better model fit, to 1587.7.
Step 3: Remove Other Unimportant Variables
indus (non-retail land proportion) was removed because: p-value = 0.738 (too high). AIC improved to 1585.8, meaning the model became simpler without losing accuracy.
At this point, we are left with the 11 strongest variables with indus and age removed. (There can be other weak predictors but they were kept as they improve the model slightly).
🔹 Final Adjusted R² = 0.7348
🔹 Final AIC = 1585.8